Note: This page's design, presentation and content have been created and enhanced using Claude (Anthropic's AI assistant) to improve visual quality and educational experience.
Week 7 • Sub-Lesson 3

📊 Visualisation with AI

How AI tools generate charts and figures, what they get wrong by default, and how to turn AI-generated plots into publication-ready, accessible visualisations

What We'll Cover

AI can now generate a complete data visualisation from a single natural-language prompt. You describe what you want to see, and within seconds you have a chart. This is genuinely powerful — and genuinely dangerous. The speed at which AI produces plots means you can explore data faster than ever before, but it also means you can publish misleading figures faster than ever before.

This session covers what AI visualisation tools can actually do, the specific ways their default outputs violate good visualisation principles, how to work with AI to audit and refine figures before they reach a reader, and why accessibility is not optional. By the end, you will have a practical five-step workflow for going from AI-generated draft to publication-quality figure — increasingly in conversation with the AI rather than after it.

🤖 What AI Can Generate

The landscape of AI-assisted visualisation has expanded rapidly. Several tools now let you describe a chart in plain English and receive working code or a rendered image in return. The underlying approach varies, but the result is the same: you no longer need to memorise matplotlib syntax or ggplot grammar to produce a chart. You do, however, still need to know whether the chart is any good.

Claude Code (or OpenAI Codex) with matplotlib

Claude can write and execute Python code directly, including full matplotlib, seaborn, and plotly visualisations. You describe what you want, provide your data (or point to a file), and Claude generates the code, runs it, and shows you the result — all in one conversation.

  • Full control over every aspect of the plot via Python code
  • Can iterate: "make the axis labels larger," "switch to a log scale," "add error bars"
  • Supports any Python plotting library (matplotlib, seaborn, plotly, altair)
  • You can inspect and modify the generated code directly
  • Best for researchers who want to learn plotting alongside using it

ChatGPT Code Interpreter

OpenAI's Code Interpreter (available in ChatGPT Plus) lets you upload data files and ask for visualisations in natural language. It runs Python in a sandboxed environment, produces charts, and lets you download the results. The experience is similar to Claude Code but runs in a browser-based sandbox.

  • Upload CSV, Excel, or other data files directly
  • Generates and executes Python plotting code automatically
  • Good for quick exploratory analysis of uploaded datasets
  • Limited to the libraries available in its sandbox environment
  • Less transparent about the code it runs unless you ask to see it

LIDA

LIDA, developed by Microsoft Research and introduced by Dibia (2023), is a research tool that automates the entire visualisation pipeline: data summarisation, goal generation, chart creation, and even infographic styling. It uses LLMs to generate visualisation code from natural-language descriptions of data.

  • Automatically suggests visualisation goals based on your data
  • Generates multiple chart options for each goal
  • Includes an infographic module for styled output
  • Open-source and available on GitHub
  • Represents the research frontier of automated visualisation

Google Colab AI

Google Colab now includes AI-assisted code generation that can help write plotting code within Jupyter notebooks. You type a comment describing what you want, and Colab suggests the code to produce it. This integrates AI assistance into a workflow many researchers already use.

  • Works inside the familiar Jupyter/Colab notebook environment
  • Suggestions appear inline as you write comments or code
  • Free tier available (with usage limits on AI features)
  • Natural fit for researchers already using Colab for data analysis
  • Less conversational than Claude or ChatGPT — more like autocomplete for code

📌 What All These Tools Share

Every AI visualisation tool ultimately generates code (usually Python) that calls a plotting library. The AI is not drawing pixels — it is writing instructions for matplotlib, seaborn, plotly, or a similar library. This means the output is only as good as the code the AI writes, and that code reflects the defaults and conventions baked into those libraries. Understanding this is key to understanding why AI-generated charts so often need fixing.

🚀 A Capability Worth Knowing About

Modern agentic tools — particularly Claude Code — can view the plots they generate. When they produce a chart, they can inspect the rendered image, identify problems ("the y-axis starts at 94 which exaggerates the difference," "the legend overlaps the data"), and revise the code accordingly. If you explicitly ask Claude Code to critique its own visualisation for publication quality, colourblind safety, and clarity, it will often catch many of the issues described in this lesson and fix them autonomously. This does not eliminate your need to check the final output — but it does mean your starting point can be substantially better than raw matplotlib defaults.

⚠️ Good Visualisation Principles AI Often Violates (unless you prompt it to look out for them)

AI tools are trained on code from the internet, which means they reproduce the most common plotting patterns — not the best ones. The defaults in matplotlib and seaborn were designed for quick exploratory analysis, not for publication. When AI generates a chart, it almost always produces something that looks reasonable at first glance but violates one or more principles of effective data communication. Here are the six most common problems.

Misleading Axes

AI-generated charts frequently use axis ranges that exaggerate or minimise differences in the data. A bar chart with a y-axis starting at 95 instead of 0 can make a 2% difference look like a 50% difference. Conversely, an axis range that is too wide can flatten meaningful variation into invisibility.

  • Bar charts should almost always start at zero
  • Line charts have more flexibility, but truncated axes must be flagged
  • AI rarely considers whether the axis range is appropriate for the story the data tells
  • Dual y-axes are another common AI default that can mislead readers into seeing correlations that do not exist
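The baseline fix is a one-line instruction to the plotting library. A minimal matplotlib sketch (the survey scores here are invented for illustration) shows how anchoring the y-axis at zero keeps a small difference looking small:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; safe for scripts and CI
import matplotlib.pyplot as plt

# Hypothetical scores: a y-axis starting at 95 would make this ~3% gap look huge
groups = ["Control", "Treatment"]
scores = [95.1, 97.8]

fig, ax = plt.subplots()
ax.bar(groups, scores)
ax.set_ylim(bottom=0)          # bar charts: anchor the baseline at zero
ax.set_ylabel("Mean score (points)")
ax.set_title("Treatment raises mean score by 2.7 points")
```

Adding "start the y-axis at zero" to your prompt achieves the same thing, but it is worth recognising the line in generated code so you can audit for it.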

Poor Colour Choices

Default colour palettes in most plotting libraries were not designed with colourblind readers in mind, and AI tools rarely switch away from these defaults. The classic matplotlib blue-orange-green palette is problematic for the roughly 8% of men and 0.5% of women with colour vision deficiency. Beyond accessibility, AI often uses too many colours or applies colour without informational purpose.

  • Default palettes fail common colourblindness tests
  • AI uses colour for decoration rather than information encoding
  • Too many categories in one colour scheme creates confusion
  • No consideration of whether the chart will be printed in greyscale
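One way to override the default palette globally is to replace matplotlib's colour cycle with the Okabe-Ito palette (discussed further below). A sketch, using the published hex codes:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from cycler import cycler  # cycler ships with matplotlib

# Okabe-Ito: eight categorical colours distinguishable under the common
# forms of colour vision deficiency (values are the published hex codes)
OKABE_ITO = ["#E69F00", "#56B4E9", "#009E73", "#F0E442",
             "#0072B2", "#D55E00", "#CC79A7", "#000000"]

plt.rcParams["axes.prop_cycle"] = cycler(color=OKABE_ITO)

fig, ax = plt.subplots()
for i in range(3):  # each series now draws from the safe palette automatically
    ax.plot([0, 1, 2], [i, i + 1, i + 2], label=f"series {i}")
ax.legend()
```

Setting the cycle once at the top of a script means every subsequent plot inherits the accessible palette, which is easier to audit than fixing colours chart by chart.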

Overplotting

When AI generates scatter plots or dense line charts, it rarely accounts for overlapping data points. A scatter plot with 10,000 points where most overlap into an opaque blob tells you nothing about the distribution underneath. AI tools default to plotting every point at full opacity, obscuring the very patterns you are trying to reveal.

  • Scatter plots with many points need transparency (alpha), jittering, or density plots
  • Line charts with many series become unreadable spaghetti
  • AI defaults to full opacity and no jitter
  • Hexbin plots, contour plots, or violin plots are often better choices that AI rarely suggests on its own
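The transparency fix is a single keyword argument. A sketch with synthetic data shows the difference between the opaque default and an alpha-adjusted version:

```python
import matplotlib
matplotlib.use("Agg")
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = x + rng.normal(scale=0.5, size=10_000)

fig, (left, right) = plt.subplots(1, 2, figsize=(8, 3.5))
left.scatter(x, y, s=5)               # the AI default: an opaque blob
left.set_title("Full opacity")
right.scatter(x, y, s=5, alpha=0.05)  # transparency reveals density structure
right.set_title("alpha=0.05")
# For very large n, a binned density view is often clearer still, e.g.:
# ax.hexbin(x, y, gridsize=40)
```

A useful rule of thumb: the more points, the lower the alpha; past roughly 100,000 points, switch to hexbin or contour density plots instead.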

Wrong Chart Type

AI tends to reach for the chart type most commonly associated with a keyword rather than the type best suited to the data. Ask for "a comparison" and you will likely get a bar chart, even when a dot plot, slope chart, or small multiples display would be more effective. Ask for "a trend" and you get a line chart, even when the data is not continuous.

  • Pie charts for more than 3–4 categories (humans are poor at comparing angles)
  • 3D charts that add no information but obscure patterns
  • Line charts for categorical or discontinuous data
  • Stacked bar charts when the middle segments are impossible to compare
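As one concrete alternative, a sorted dot plot often beats the reflexive bar chart for comparisons, because position along a common axis is the encoding humans read most accurately. A sketch with invented accuracy values:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Invented category values for illustration
labels = ["Method A", "Method B", "Method C", "Method D"]
values = [0.72, 0.81, 0.68, 0.90]

# Sort so the ranking is readable at a glance
order = sorted(range(len(values)), key=values.__getitem__)

fig, ax = plt.subplots()
ax.scatter([values[i] for i in order], range(len(order)), zorder=3)
ax.set_yticks(range(len(order)), [labels[i] for i in order])
ax.set_xlabel("Accuracy")
ax.set_xlim(0, 1)
ax.grid(axis="x", linewidth=0.5, alpha=0.5)  # light guides, not heavy gridlines
```

Asking the AI for "two or three alternative chart types for this comparison" is usually enough to surface options like this that it would not volunteer.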

Missing Context

A chart without proper labels, units, a clear title, and source attribution is a chart that cannot be interpreted. AI-generated plots routinely omit axis units, use generic titles like "Data Plot," leave out sample sizes, and provide no indication of statistical uncertainty. A figure in a paper must stand alone — a reader should understand it without reading the surrounding text.

  • Axis labels missing units (is that kilometres, miles, or arbitrary units?)
  • No indication of sample size or data source
  • Generic or missing titles that do not describe the finding
  • No error bars, confidence intervals, or uncertainty indicators
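All four fixes are a few lines of code once you know to ask for them. A sketch with invented measurements showing units, sample size, and error bars in one figure:

```python
import matplotlib
matplotlib.use("Agg")
import numpy as np
import matplotlib.pyplot as plt

# Invented measurements for illustration
conditions = ["Baseline", "Drug"]
means = np.array([4.2, 6.1])
sems = np.array([0.3, 0.4])   # standard error of the mean
n = 12                        # sample size per condition

fig, ax = plt.subplots()
ax.bar(conditions, means, yerr=sems, capsize=4)   # error bars with caps
ax.set_ylim(bottom=0)
ax.set_ylabel("Reaction time (s)")                # units, not just a variable name
ax.set_title(f"Drug slows mean reaction time (mean ± SEM, n={n} per group)")
```

Note that the title states the finding and the uncertainty convention, so the figure can stand alone.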

Chartjunk

Edward Tufte coined the term "chartjunk" for visual elements that do not convey information: gratuitous gridlines, heavy borders, decorative backgrounds, 3D effects, and redundant legends. AI-generated charts often include all of these because the training data is full of them. Every non-data element in a chart competes for the reader's attention and should earn its place.

  • Heavy gridlines that dominate the data
  • Borders and boxes around every element
  • Legends that duplicate information already in the axis labels
  • Background colours, gradient fills, and shadow effects that add nothing
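Stripping chartjunk from generated code is mostly deletion plus a few explicit settings. A sketch (the trend data is invented) showing the common cleanup moves:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([2019, 2020, 2021, 2022], [3.1, 3.4, 4.0, 4.6])  # invented trend

# Remove non-data ink: the top and right box lines add nothing
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.grid(axis="y", linewidth=0.4, alpha=0.4)  # faint guides only where they help
ax.set_xticks([2019, 2020, 2021, 2022])      # integer years, not 2019.5
```

Each removal should pass Tufte's test: if deleting an element loses no information, delete it.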

⚠️ The Core Problem

AI tools optimise for producing a chart that looks complete, not a chart that communicates accurately. A chart can be technically correct — the data points are in the right places — and still be misleading because of how it is framed. The visual choices (axis range, colour, chart type, annotations) are as important as the data itself, and these are exactly the choices AI handles poorly by default.

⚖️ When AI Visualisation Helps vs. When It Misleads

The gap between an AI-generated draft plot and a publication-quality figure is not trivial. Understanding where that gap lies — and how easy it is to miss — is essential for any researcher using AI to create visualisations.

The Default Settings Trap

The most dangerous aspect of AI-generated visualisations is that they look professional enough to use immediately. A matplotlib chart with default settings has clean lines, a white background, and properly rendered text. It looks like a finished product. But "looks finished" and "communicates accurately" are different things.

Consider what happens when you ask an AI to plot quarterly revenue data. The AI will likely produce a clean bar chart with sensible colours and axis labels. What it will not do, unless you specifically ask, is:

  • Adjust for inflation if the data spans multiple years
  • Start the y-axis at zero (it may auto-scale to exaggerate trends)
  • Add context about what drove changes (annotations for key events)
  • Consider whether a line chart would better show the trend
  • Include uncertainty or variance indicators
  • Choose colours that work in greyscale for journal printing

The trap is that the chart is technically correct but communicatively incomplete. Researchers who accept AI defaults without review risk publishing figures that mislead readers — not through fabrication, but through omission and poor framing.

Publication Quality Is a Different Standard

Journal reviewers and readers expect figures that meet specific standards: vector formats (PDF or SVG) rather than rasterised PNG, consistent font sizes across all figures in a paper, colour palettes that work for colourblind readers, axis labels with units, informative captions, and appropriate statistical annotations.

AI tools do not produce this by default — but modern agentic tools like Claude Code can get remarkably close when prompted well. Claude Code can generate a plot, view the output itself, critique what it sees ("the axis labels are too small," "this palette is not colourblind-safe"), and iterate on the code accordingly. This is a genuine capability leap over earlier tools. With good prompting — explicitly specifying colourblind-safe palettes, vector export, accessible fonts, and annotation requirements upfront — you can often reach near-publication quality in a single conversation.

That said, there are things AI cannot judge on your behalf: whether this chart type is the right choice for your argument, whether the framing is honest, whether the figure will stand alone for a reader who has not read your paper. Those are communication design decisions that require your judgment, not just your approval.

📌 Where AI Genuinely Helps

AI visualisation tools are excellent for exploration. When you are trying to understand your data — looking for patterns, checking distributions, comparing groups — the speed of AI-generated plots is transformative. You can produce twenty different views of your data in the time it would take to hand-code two. The danger comes when you skip the step of turning exploratory plots into carefully designed communication tools before publishing them.

♿️ Accessibility in Data Visualisation

Accessibility in data visualisation is not a nice-to-have — it is a professional obligation. Roughly 8% of men and 0.5% of women worldwide have some form of colour vision deficiency. In a lecture hall of 100 students or a conference room of 50 researchers, the odds are high that someone cannot distinguish the colours in your chart. If your figure relies on colour alone to convey information, you have excluded those readers from understanding your work.

Colourblind-Safe Palettes

The single most impactful change you can make to any AI-generated chart is to switch from the default colour palette to a colourblind-safe alternative. Several well-tested palettes exist:

  • ColorBrewer palettes: Developed by cartographer Cynthia Brewer, these palettes have been rigorously tested for perceptual uniformity and colourblind safety. The ColorBrewer 2.0 tool lets you select palettes filtered by colourblind-safe options. In matplotlib, use plt.get_cmap('Set2'); in seaborn, use the "colorblind" palette.
  • Viridis family: The viridis, magma, inferno, and plasma colour maps in matplotlib were specifically designed to be perceptually uniform and colourblind-safe. They work well for continuous data and are readable in greyscale.
  • Okabe-Ito palette: An 8-colour categorical palette designed by Masataka Okabe and Kei Ito that is distinguishable under all common forms of colour vision deficiency.
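For continuous data, the viridis family is the usual safe choice. A sketch applying it to an invented 2-D field, with a labelled colourbar:

```python
import matplotlib
matplotlib.use("Agg")
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
z = rng.random((20, 20))              # invented 2-D field for illustration

fig, ax = plt.subplots()
im = ax.imshow(z, cmap="viridis")     # perceptually uniform, greyscale-safe
fig.colorbar(im, ax=ax, label="Intensity (a.u.)")
```

Because viridis maps value monotonically to lightness, the same figure remains readable when printed in black and white.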

When prompting AI to generate visualisations, explicitly request a colourblind-safe palette. Without this instruction, AI will almost always use the library default, which is rarely optimised for accessibility.

Beyond Colour: Redundant Encoding

The gold standard for accessible visualisation is redundant encoding: using multiple visual channels (colour, shape, pattern, position, line style) to convey the same information. If your scatter plot uses colour to distinguish groups, also use different marker shapes. If your line chart uses colour for different series, also use different line styles (solid, dashed, dotted).

  • Combine colour with shape (circles, squares, triangles) for scatter plots
  • Combine colour with line style (solid, dashed, dotted) for line charts
  • Use direct labelling on or near data elements instead of a separate legend
  • Ensure sufficient contrast between adjacent elements (minimum 3:1 contrast ratio)
  • Test your figures in a colourblindness simulator (Coblis or Color Oracle are free tools)
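Redundant encoding takes only a few extra keyword arguments. A sketch (series and colours chosen for illustration, using Okabe-Ito hex values) that gives each line a distinct colour, marker, and line style:

```python
import matplotlib
matplotlib.use("Agg")
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(5)
# Each series gets colour AND marker AND line style, so the chart survives
# greyscale printing and colour vision deficiency alike
styles = [("#0072B2", "o", "-"), ("#D55E00", "s", "--"), ("#009E73", "^", ":")]

fig, ax = plt.subplots()
for i, (colour, marker, ls) in enumerate(styles):
    ax.plot(x, x * (i + 1), color=colour, marker=marker, linestyle=ls,
            label=f"series {i + 1}")
ax.legend()
```

If any one channel fails for a reader — colour, marker, or line style — the other two still carry the grouping.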

Alt Text for Figures

Every figure in a digital publication should have meaningful alternative text (alt text) that describes what the figure shows for readers using screen readers. This is a basic web accessibility requirement that academic publishing has been slow to adopt, but it is increasingly expected by journals and funding bodies.

Good alt text for a data visualisation should include:

  • Chart type: "Bar chart showing..." or "Scatter plot comparing..."
  • Key finding: What the figure demonstrates, not just what it contains
  • Data summary: The main trend, difference, or pattern visible in the chart
  • Scale: Approximate range of values, especially if the finding depends on magnitude

AI can help write alt text — describe your figure to Claude or ChatGPT and ask for alt text — but you must verify that the description accurately reflects what the figure actually shows. AI-generated alt text can hallucinate trends or misstate values just as easily as AI-generated prose can hallucinate citations.

⚠️ The 8% You Cannot Ignore

Approximately 8% of men have some form of colour vision deficiency — predominantly red-green colour blindness (deuteranopia and protanopia). This is not a rare condition. In any sizeable academic audience, multiple people will struggle with figures that rely on red-green distinctions. The default colour palettes in matplotlib and many other tools use exactly these problematic colour combinations. Switching to a colourblind-safe palette costs you nothing and includes a meaningful fraction of your audience. There is no good reason not to do it.

🛠️ A Practical Five-Step Workflow

The following workflow takes you from raw data to a publication-ready, accessible figure using AI as your drafting tool. Each step builds on the previous one, and skipping any step risks publishing a figure that misleads or excludes readers.

From AI Draft to Publication-Ready Figure

  1. Explore with AI — generate multiple views. Start by asking the AI to produce several different visualisations of your data. Do not commit to the first chart it gives you. Ask for a bar chart, a box plot, a scatter plot, and a small multiples display of the same data. Compare them. Which one reveals the pattern most honestly? Which one would mislead a reader? This exploratory phase is where AI speed is most valuable. Generate ten plots in ten minutes, then think carefully about which one to develop.
  2. Audit the defaults — check every visual choice. Take the most promising chart and systematically check: Does the axis start at an appropriate value? Are the units labelled? Is the title informative (describing the finding, not just the data)? Is the chart type appropriate for the data? Are there error bars or uncertainty indicators where needed? Does the colour palette serve a purpose, or is it decorative? Would this chart make sense to someone who has not read the surrounding text? Fix every issue you find.
  3. Fix accessibility — colours, encoding, and alt text. Switch to a colourblind-safe palette (ColorBrewer, viridis, or Okabe-Ito). Add redundant encoding: if colour distinguishes groups, also use different shapes or line styles. Check contrast. Write alt text that describes the chart type, the key finding, and the data range. Test with a colourblindness simulator. This step is non-negotiable for professional-quality work.
  4. Polish for publication — fonts, formats, and consistency. Match fonts and sizes to your other figures and to the journal's requirements. Export in a vector format (PDF or SVG) for print, or high-resolution PNG (300+ DPI) for digital. Remove chartjunk: unnecessary gridlines, decorative elements, redundant legends. Ensure the figure caption is complete and accurate. Check that the figure works in greyscale if the journal prints in black and white.
  5. Verify against the data — the final sanity check. Before submitting, verify that the figure accurately represents the underlying data. Check specific values: does the tallest bar correspond to the largest number in your data? Do the axis limits include all data points? Are any outliers hidden by the axis range? This step catches errors introduced during formatting and is especially important when AI has transformed or aggregated data during the plotting process.
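Step 5 can be partly automated: matplotlib lets you read the rendered geometry back off the Axes and compare it to the source data. A sketch with invented regional counts:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

values = {"North": 41, "South": 37, "East": 52, "West": 29}  # invented data

fig, ax = plt.subplots()
ax.bar(values.keys(), values.values())

# Read the drawn bar heights back and confirm they match the data exactly
heights = [patch.get_height() for patch in ax.patches]
assert heights == list(values.values()), "figure does not match the data"
# The tallest bar must correspond to the largest value
assert max(heights) == max(values.values())
# Axis limits must include every data point
low, high = ax.get_ylim()
assert low <= 0 and high >= max(heights)
```

Checks like these are cheap to add and catch exactly the class of error — silent transformation or aggregation during plotting — that AI-assisted pipelines are prone to.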

📌 Prompting Tip

When asking AI to generate a visualisation, include your quality requirements in the initial prompt. Instead of "plot this data," try: "Create a publication-quality scatter plot of X vs Y. Use the Okabe-Ito colourblind-safe palette. Include axis labels with units, a descriptive title, and error bars for the mean values. Export as SVG." The more specific your prompt, the less fixing you need to do afterward.

📚 Readings for This Session

Core Readings

  1. Dibia (2023). "LIDA: A Tool for Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models." arXiv preprint. Read on arXiv — The foundational paper on using LLMs to automate the full visualisation pipeline, from data summarisation through goal generation to chart creation. Introduces the LIDA framework and demonstrates what automated visualisation can and cannot achieve.
  2. Wilke (2019). Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures. O'Reilly Media. Read online (free) — The best practical guide to data visualisation principles for researchers. Covers chart selection, colour use, axis design, and annotation with clear examples of what works and what does not. The full text is freely available online.
  3. ColorBrewer 2.0. colorbrewer2.org — The essential tool for selecting colourblind-safe, print-friendly colour palettes. Originally designed for cartography but widely used across all data visualisation. Experiment with different palette types (sequential, diverging, qualitative) and filter for colourblind safety.

Supplementary Resources

  • Tufte, E. (2001). The Visual Display of Quantitative Information. 2nd ed. Graphics Press. — The classic text on visualisation design. Introduces the concepts of chartjunk, data-ink ratio, and small multiples that remain central to good visualisation practice.
  • Coblis Colour Blindness Simulator: color-blindness.com/coblis — Upload your figures and see how they appear under different forms of colour vision deficiency. A quick and free way to test accessibility before publication.
  • Seaborn colourblind palette documentation: When using seaborn in Python, sns.set_palette("colorblind") switches to a colourblind-safe default. This single line of code addresses the most common accessibility failure in AI-generated charts.

Key Takeaways

AI makes chart generation fast, and modern tools can critique their own output. Tools like Claude Code can generate a visualisation, view the rendered result, identify problems, and iterate — all without manual code-pasting. With good prompting, you can get close to publication quality in one conversation. But "close" and "publication-ready" are still different things, and the gap increasingly lives in communication design decisions rather than technical defaults.

Six common failures to watch for. Misleading axes, poor colour choices, overplotting, wrong chart types, missing context, and chartjunk are the most frequent problems in AI-generated visualisations. Knowing these failure modes means you know exactly what to check every time AI gives you a chart.

Accessibility is not optional. With approximately 8% of men affected by colour vision deficiency, any figure that relies on colour alone to convey information excludes a substantial portion of your audience. Colourblind-safe palettes, redundant encoding, and alt text are baseline requirements for professional work.

Use AI as a collaborative design partner, not just a code generator. The five-step workflow — explore, audit, fix accessibility, polish, verify — can now happen in conversation with the AI rather than purely after it. Ask Claude Code to critique its own charts, specify quality requirements upfront, and iterate in dialogue. The AI handles the syntax and can flag many technical issues; you handle the scientific and communication design decisions that no tool can make for you.

Next session: Sub-Lesson 4 brings together everything from this week with hands-on activities and your weekly assessment. You will practice the full workflow from data to publication-ready figure, audit AI-generated visualisations for the problems discussed here, and build the habits that distinguish effective data communication from pretty pictures.